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Abstract —In this paper, we handle the problem of efficient 
user-mobility driven macro-cell planning in cellular networks. As 
cellular networks embrace heterogeneous technologies (including 
long range 3G/4G and short range WiFi, Femto-cells, etc.), most 
traffic generated by static users gets absorbed by the short-range 
technologies, thereby increasingly leaving mobile user traffic to 
macro-cells. To this end, we consider a novel approach that 
factors in the trajectories of mobile users as well as the impact of 
city geographies and their associated road networks for macro¬ 
cell planning. Given a budget k of base-stations that can be 
upgraded, our approach selects a deployment that improves 
the most number of user trajectories. The generic formulation 
incorporates the notion of quality of service of a user trajectory as 
a parameter to allow different application-specific requirements, 
and operator choices. We show that the proposed trajectory utility 
maximization problem is NP-hard, and design multiple heuristics. 
To demonstrate their efficacy, we evaluate our algorithms with 
real and synthetic datasets emulating different city geographies. 
For instance, with an upgrade budget k of 20%, our algorithms 
perform 3-8 times better in improving the user quality of service 
on trajectories when compared to greedy location-based base- 
station upgrades. 


I. Introduction 

As cellular networks advance towards providing high- 
bandwidth services to a large subscriber base, the revenue 
growth for cellular network operators is significantly slow¬ 
ing down On one hand, operators are making heavy 
investments to upgrade the network to cater to the growing 
bandwidth demands. On the other hand, the revenue per byte 
is decreasing because of increased competition and demand for 
cheaper services. Consequently, to keep network deployment 
costs low, cellular operators are increasingly focusing on short- 
range technologies, such as Small-cells and Femto-cells, for 
meeting the high-bandwidth demands from customers @,0. 

While short-range technologies provide high bandwidth to 
static users at a much cheaper cost per byte, long-range 
networks are more appropriate to provide continuous coverage. 
Hence, as short-range networks absorb static user traffic, 
macro-cells will increasingly handle mobile user traffic. As 
a result, it becomes important to consider mobility patterns of 
users for effective macro-cell upgrades. For example, users are 


accessing a variety of applications such as YouTube and maps 
when mobile. These applications require high bandwidth and 
delay guarantees to ensure high quality of experience (QoE). 
One recent survey indicates that data traffic increases by 20- 
30% during busy commute hours j4j. Hence, it is imperative 
for the operators to plan macro-cell upgrades to maximize the 
quality of experience for mobile subscribers. 

To the best of our knowledge, no current approach, either 
in research literature or in practice, considers user mobil¬ 
ity trajectories and the experience perceived by users for 
macro-cell upgrades. Operators currently perform incremental 
upgrade of a few base-stations to newer technology while 
installing cheaper older technology base-station in areas with 
low demand. For example, a major operator, Airtel in India, 
deployed 6,728 new 3G sites (27% year-on-year growth), while 
also deploying 4,977 new 2G sites in 2013-2014 [|5]]. 

Operators deploy higher generations of technology based 
on the anticipated static user population. This often leads to 
switching between base-stations of multiple generations of 
technologies on a mobile user trajectory, thereby leading to 
degraded quality of experience for mobile users. For instance, 
a mobile 4G user on a given daily commute trajectory may 
often handoff from a 4G cell to a 3G/2G cell, depending on 
the current deployment. 

In this paper, we focus on explicitly incorporating the 
experience of mobile users on their trajectories for incremental 
upgrade of cellular networks from one generation to another. 
Note that upgrades often happen at cell-towers that already 
have a previous generation of the technology deployed, mainly 
to keep real-estate costs for cell-sites (including land, room, 
grid connection, diesel generator, backhaul provisioning, etc.) 
low. Hence, we focus on the problem of identifying base- 
stations that need to change from one generation to another, 
and leave RF-level planning 0 0 as a follow-up task that 
field engineers perform at the towers identified for upgrades. 

Specifically, we tackle the following budget-constrained 
Trajectory Utility Maximization Problem (TUMP): Given a 
macro-cell network with n base-stations and quality of ex¬ 
perience of mobile users when connected to a sequence of 
base-stations, how do we choose k base-stations such that the 
overall mobile user experience is maximized? 
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TABLE I. Route Characteristics 


Route 

Drive Time 

Num Trajectories 

Num BS 

Approx start hour 

RT-1 

26 hours 

13 

100 

8:00 AM 

RT-2 

25 hours 

18 

76 

6:15 PM 

RT-3 

13 hours 

7 

56 

7:45 PM 

TOTAL 

64 hours 

38 

158 



The key idea of the TUMP problem is to: (1) consider 
the performance perceived by users on their trajectories using 
call/transaction records, (2) identify the base-stations that 
provide lower QoS on each trajectory, and (3) deploy higher 
capacity base-stations such that maximum number of trajecto¬ 
ries are bottleneck-free. 

We make the following contributions: 

1) To the best of our knowledge, this is the first paper 
to propose cellular macro-network planning by con¬ 
sidering users’ trajectories. We develop an extensible 
framework (TUMP) for optimizing mobile user experi¬ 
ence for different trajectory utility functions that capture 
different application QoS requirements. 

2) We show that TUMP is NP-hard. We then present 
two heuristics: INC-GREEDY and DEC-GREEDY. We 
prove bounds on their efficacy, and show that they can 
be incrementally applied to an evolving network; i.e., as 
and when the operator allocates additional budget, these 
algorithms can be applied to incrementally evolve the 
network from one generation to another. Our techniques 
enable operators to satisfy 3-8 times more number 
of mobile users than an approach that uses a greedy 
location-based base-station upgrade. 

3) We measure and analyze the existing experiences of 
mobile users during their everyday commute. We show 
that around 50% of the base-stations during commute 
provide throughputs that cannot cater to a medium sized 
video. Users often experience long stretches of time that 
degrade user experience. 

4) We develop a network trace generator from how people 
move in large cities. We believe that the trace generator 
is useful for research beyond this paper since cellular 
network traces are most often not published openly by 
the operators. Through these traces, we show that the 
investment required to provide satisfactory QoS to mo¬ 
bile users is dependent on the population distributions 
and their road-network. Specifically, cities with a dense 
central business districts, such as New York, need less 
budget to satisfy a large segment of mobile users than 
cities where businesses are spread out (e.g., Atlanta). 


The rest of the paper is organized as follows. We describe 
a measurement-based analysis of the problem in Section [IT] 
Section [III] discusses the proposed trajectory utility maximiza¬ 
tion problem (TUMP). Section [IV] analyzes the approximation 
algorithms and their bounds. In Section [V] and Section VI 


we evaluate the efficacy of the algorithms under different 
parame ter se ttings. Section describes related work while 
Section I yUTI discusses avenues for future work and concludes. 



BS# 

Fig. 1. Throughput distribution on base-stations 


II. A Measurement-based Motivation 

We conducted measurement-based experiments to quantify 
user experiences during mobility. We developed an Android 
app that measures the throughput while users are traveling on 
their daily trajectories. The users have a 3G data connection. 
We demonstrate that the users often experience prolonged peri¬ 
ods of non-satisfactory download rates along their trajectories. 

Our measurements are aware of the data limits imposed 
on the cellular plan. Hence, our app was tuned to: (1) 
sample throughput with a high-periodicity of approximately 
five minutes (only when the user is traveling); and, (2) each 
sample consists of downloading chunks of 50 kB for 5 times 
consecutively; the size of each sample was chosen such that 
there are at least 30 TCP packets (to accommodate TCP’s 
startup delays). For each sample, we record the base-station 
id, technology of the base-station, and GPS coordinates of the 
user. 

We collected over 20 days of driving data on three main 
routes. Each route is around 25 Km, and contains multiple 
trajectory samples over different days. Table [I] summarizes the 
different routes that were taken by the users. 

We observed 158 unique base-stations, out of which 83 had 
20 or more samples. Fig. |T] shows the throughput statistics 
on these 83 base-stations. All the base-stations, except the 
last two, were indicated as 3G base-stations; the last two 
were 2G. While 3G currently claims to provide 2 Mbps in the 
measured regions, we observed that most base-stations provide 
much lower throughput during every day mobility scenario. 
Such achievable throughput cannot cater to the demanding 
applications, such as video streaming, that mobile users often 
desire to use during everyday long commutes 0 

For example, around 50% of base-stations that we sam¬ 
pled cannot cater to the recommended bit-rate for a medium 
sized YouTube stream (320p video), which requires 750 kbps. 
Recommended bit-rates for YouTube videos of other sizes are 
shown by the dotted red line in Fig. [T] 

We now analyze the variations of th roughputs across base- 
stations on user trajectories. Fig. |2(a)| demonstrates the fluc¬ 
tuations in throughputs as the user is traveling on RT-2 on 
various days. Each row plots one trajectory, with the dots 
representing the median throughput achieved. The radius of 
the dot is proportional to the throughput, and the dots are 
color-coded at 4 t hresh olds corresponding to different YouTube 
size videos. Fig. |2(a)| shows that mobile users often receive 
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(a) Throughputs on different trajectories in RT-2 



Fig. 2. Mobile users constantly experience low throughput on trajectories 


low capacity for extended durations of time. On RT-2, more 
than 47% of the throughput sampled on the trajectories are 
below 400 kbps, which is the recommended bit rate for a lower 
quality (240p) YouTube video. 

We construct a base station graph (BS-graph) for each 
trajectory, and demonstrate the variance of throughput. Each 
node in a BS-graph is a base-station. Ideally, a directed edge 
is drawn between two base-stations if the user hands-off from 
one to the other. However, due to our limited sampling, an edge 
denotes that the user was connected to the two base-stations in 
consecutive samples. The radius of the node is proportional to 
the median throughput observed when connected to this base- 
station. 

Fig. 2(b)| shows the sequence of unique base-stations that 
the user had recorded on the trajectories. The lower half of 
Fig 2(b) shows the BS-graph for all trajectories on RT-2, 
and the upper part of the figure separates the BS-graph for 
individual trajectories. We use this notion of a chain of base- 
stations, with the user having some experience on each base- 
station, to theor eticall y abstra ct the mobile user experience on 
a trajectory. Fig. |2(a) and Fig. 2(b) show that the user has long 
stretches of time (and sequence base-stations) that degrade 
user experience. 


III. Trajectory Utility Maximization Problem 

In this section, we design a generic formulation that, given a 
budget, plans base-station upgrades that maximizes the mobile 
users’ experience. Our model is practical and useful for today’s 
cellular operators as it utilizes the already existing data and 
enables the operator to control key parameters for base-station 
upgrades. Specifically, we allow the operator to: (1) optimize 
based on a given budget; (2) define the strictness of when a 
subscriber’s mobile experience is considered to be poor; (3) 


utilize active data already stored by many operators, such as 
call records 0 and deep-packet inspection logs |9], to quantify 
user experience on a trajectory. 

Consider the network B = {Bi, ..., B n } of n base-stations 
spread across a region. A trajectory Tj is represented as a 
sequence of tuples of the form 47 = (Bi, rji) that captures 
the user experience. The user on this trajectory was connected 
to the base-station Bi E B for a time interval of A i units 
and received a throughput of rji bytes per unit of time. Note 
that rji can be any metric as long as a greater rp denotes 
better experience (e.g., throughput or packet success rate) when 
associated to a respective base-station. Henceforth, for brevity, 
we refer to this metric as throughput. As we show in Section [V| 
the trajectories and 4>^’s can be constructed by scanning the 
active transaction records maintained by the operator. 

For ease of notation, we write Bi E Tj if the base-station 
Bi E B lies on the trajectory Tj. The length of a trajectory Tj , 
denoted by \Tj\, is simply the count of base-stations that lie 
on it. Suppose d denotes the maximum length of a trajectory. 

For a trajectory Tj, a base-station Bi E Tj is a bottleneck 
base-station if it offers a degraded quality of service, e.g., an 
extremely low upload/download speed, a call-drop, etc. In our 
model, we assume that a base-station Bi acts as a bottleneck 
w.r.t. a trajectory Tj if the corresponding throughput is less 
than a threshold, i.e., rji < r. The value of r is computed 
from a combination of network parameters. A base-station that 
is a bottleneck for one trajectory may not be a bottleneck for 
other trajectories since different users may experience different 
throughputs based on various factors such as data plan, time 
of the day, etc. 

Our goal is to maximally improve the mobile user ex¬ 
perience by selectively upgrading k out of n base-stations 
that act as bottlenecks on some trajectories. Suppose X = 
{xi,... ,x n } denotes the boolean solution vector such that 
Xi = 1 if and only if base-station Bi is chosen for upgradation 
and 0 otherwise. 

Our proposed framework allows the network operator to 
specify any trajectory utility function Wj : (Tj, X) [0? !]’ 
defined over each trajectory Tj and solution X, that captures 
the impact of base-station upgrades on the trajectory Tj. We 
assume that Wj increases as more number of bottleneck base- 
stations on the trajectory Tj get upgraded. In other words, the 
trajectory utility increases with its quality of experience (QoE). 
Our aim is to maximize the number of trajectories with high 
utility. 

To do so, given any trajectory utility function Wj, we map 
it to a step utility function Uj using a threshold 7 (0 < 7 < 1), 
henceforth referred to as the bottleneck parameter. 

u ‘={; <■> 

We now formally state the Trajectory Utility Maximization 
Problem, TUMP (7). 

Problem 1 (TUMP( 7 )). Given a base-station network B of 
size n, a budget parameter k, a bottleneck parameter 7, and 
a set of m trajectories T = {Ti,..., T m }, each of which has 
an associated utility function Wj, determine the set ofk base- 
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Fig. 3. Solution Overview 

stations to upgrade such that the sum of utilities eT * s 
maximized, where Uj is given by Eq. 0. 

Intuitively, a solution to a given instance of TUMP ( 7 ) with 
a high value of 7 would benefit lesser number of spatially 
distinct trajectories, as it attempts to utilize the available 
resources (i.e., upgraded base-stations) to optimize the QoE 
on these trajectories. On the other hand, a solution to the 
same instance of TUMP ( 7 ) with a lower value of 7 would 
attempt to be more fair in distribution of the available resources 
across a larger set of spatially distinct trajectories at the cost 
of allowing limited improvement in QoE. The operator can 
judiciously tune 7 based on the budget and desired subscriber 
experience. 


A. Solution Overview 

Fig. [3] outlines the basic steps involved in our solution 
framework. We construct the user trajectories from call records 
that the operator has already stored ®, 0 . Given a set of 
trajectories, we first identify the bottleneck base-stations, based 
on the throughputs received. A base-station is a candidate for 
upgrade if it is a bottleneck for any trajectory. The network 
operator provides a trajectory utility function Wj associated 
with each trajectory Tj , and the bottleneck parameter 7 . To 
maximize the number of satisfied trajectories, we map the 
utility function Wj to a step utility function Uj , using the 
bottleneck parameter 7 . Finally, we apply any of the proposed 
algorithms for TUMP( 7 ) (described in SectionlfVl), and report 
the k base-stations to be upgraded. 


B. Bottleneck Utility Function 

Though the proposed framework allows any trajectory utility 
function, for the purpose of analysis and evaluation of our 
approach, this section introduces a special trajectory utility 
function, namely bottleneck utility function. 

Given a trajectory Tj , we define the weight Wji for each 
base-station Bi G Tj , that accounts for the fraction of the total 
time that the user (on this trajectory) was connected to the 
base-station B^. More precisely, Wji = ^ A . Suppose 

bji denotes a bottleneck indicator variable that takes value 1 
if the base-station Bi G Tj is a bottleneck base-station w.r.t. 
the trajectory Tj , and 0 otherwise. 


'Two or more trajectories are spatially distinct if their corresponding set of 
base-stations are “largely” different. 


Given a trajectory Tj and solution X, we define the bottle¬ 
neck utility function Wj as follows: 

W j = T w 3i + T w H- x i 

j )bji— 0 Bi GT 'j ,bji=l 

Wj essentially captures the fraction of the total time when 
the user enjoys acceptable QoE on the trajectory Tj. If all 
the base-stations on Tj are non-bottleneck, then Wj = 1; 
otherwise, Wj < 1. Henceforth, we consider the bottleneck 
utility function as the default trajectory utility function. 

Based on this, we next define a class of trajectories that 
enjoy satisfactory QoE after upgradation of the base-stations. 

Definition 1 ( 7 -bottleneck-free trajectory). A trajectory Tj G 
T is 7 -bottleneck-free if its utility Wj > 7 where Wj is given 
by Eq. 0 . and 7 G [0,1] is the bottleneck parameter. 

Of particular analytical interest is the problem instance 
TUMP (7 =1), denoted henceforth by simply TUMP(l). In 
this case, Vj = 1 ,.. . ,ra, Uj = Wj = 1 if and only if all 
the bottleneck base-stations on the trajectory Tj are upgraded. 
This problem instance is interesting because it is the worst case 
instance of the TUMP ( 7 ) problem that aims to maximize the 
number of trajectories that are completely bottleneck-free. 

This framework allows the network operator to suitably 
select the bottleneck parameter 7 based on the application 
requirements. For example, 7 = 1 is suitable for real-time 
applications such as voice calls or video conferences whereas 
7 = 0.8 may suffice for video streaming since video play¬ 
ers can mask-off certain durations of low connectivity by 
buffering. Similarly, even 7 = 0.5 may be enough for elastic 
applications such as background synchronization of emails. 


C. Hardness of TUMP ( 7 ) 

Next, we show that TUMP ( 7 ) is NP-hard due to a reduction 
from the fc-Vertex Cover (fc-VC) problem p0| . 

Problem 2 (fc-VC). Given an undirected graph G(V,E) where 
V is the set of n nodes and E is the set of m edges, determine 
a set S C V of k nodes that maximizes the number of edges 
covered by the nodes in S, i.e., incident on at least one of the 
nodes in S. 

Being a generalization of the vertex cover problem, the fc- 
VC problem is NP-hard [ 10]. 

Theorem 1. The TUMP ( 7 ) problem is NP-hard. 

Proof: Given an instance of the fc-VC problem, G(V,E ), 
we reduce it to an instance of TUMP( 7 ) as follows. For 
each node Vi G V, we create a base-station Bi G B. For 
each edge ( v^Vj ) G E , we create a trajectory of size 2, 
{(Bi, A 0 , 77 o), {Bj, A 0 ,r/ 0 )}, where A 0 and rj 0 are arbitrary 
constant values for time interval and throughput respectively. 
Also, VTj,VBi G Tj,bji = 1, i.e., each of the base-stations 
incident on a trajectory acts as a bottleneck. 

If a subset S C V is a solution to the fc-VC problem, then 
the set of base-stations Bs = {Bfvi G S} is a solution to the 
TUMP( 7 ) problem with 7 = 0.5. If the edge ( Vi,Vj ) G E 





























5 




Trajectories 

T1=<B1,B4,B7> 

T2=<B2,B4,B6> 

T3=<B3,B4,B5> 

T4=<B8,B9,B11> 

T5=<B8,B9,B11 > 
T6=<B9,B12,B13> 
T7=<B9,B12,B13> 

T8=<B12,B10,B14> 
T9=<B12,B10,B14> 
T10=<B10,B11,B15> 
T11=<B10,B11,B15> 


Fig. 4. Trajectories in Example [I] 


is covered by the set S, then at least one of the two base- 
stations, Bi,Bj G Bs and, thus, the trajectory {(£?*, Ao,7?o), 
(Bj, Aq, 770 )} becomes 0.5-bottleneck-free, i.e., its utility be¬ 
comes 1. Since the subset S maximizes the number of edges 
covered by it, the solution Bs maximizes the number of 0.5- 
bottleneck-free trajectories or, in other words, maximizes the 
sum of trajectory utilities. 

Similarly, we argue that if B is a solution to the TUMP (7) 
problem with 7 = 0.5, then Sb = {vi\Bi G B} is a 
solution to the fc-VC problem. Since 7 = 0.5, the utility of 
a trajectory is maximized if and only if the trajectory becomes 
0.5-bottleneck-free, i.e., at least one of the two base-stations, 
Bi,Bj on the trajectory, is in the solution B. This in turn 
implies that the edge ( 77 ,Vj) G E is covered by at least one 
of the nodes v- L . Vj G S. Since B maximizes the number of 
0.5-bottleneck-free trajectories, the subset Sb maximizes the 
number of edges covered by it. 

Since the reduction requires space and time which is poly¬ 
nomial in the size of the input, the proof follows. ■ 

IV. Algorithms for TUMP( 7 ) 

This section describes the algorithms for the TUMP (7) 
problem. Firstly, we propose an integer programming based 
optimal algorithm, namely £P TU mp( 7 ) and an approximation 
scheme based on LP relaxation of JP TU mp( 7 )- Owing to their 
high running times, both of these algorithms are however, 
impractical for any reasonably sized dataset. The problem 
being NP-hard, we next present few approximation algorithms 
based on greedy paradigm, that not only offer faster running 
times, but also perform well in practice. 

We begin by illustrating a simple example of TUMP (7) 
problem, that will be shortly evaluated by different algorithms. 

Example 1 . Fig. [4] shows 11 trajectories, Ti,--* , Tn, each 
of length 3, passing through a set of 15 base-stations 
£> 1 , • • • , £> 15 . We assume that each base-station is a bottleneck 
w.r.t. each of the trajectories incident on it. In addition, for 
ease of analysis, we assume that for a given trajectory, the A, 77 
values are same for all the base-stations incident on it. For this 
reason, we did not list their values in Fig. [4] This assumption 
eliminates the influence of Wji s on the solution, hus, for any 
trajectory Tj and a base-station Bj G Tj, Wji = 1/3. We 
next observe that while each of the base-stations £?g, £? 10 , Bn 
and B \2 have a maximum of 4 incident trajectories, the base- 
stations £>4 has 3 incident trajectories, the base-stations £>g, 
£?i 3 , £?i 4 and £?i 5 have 2 incident trajectories and the rest 


have only 1 trajectory passing through them, e set k = 3 base- 
stations to upgrade as the budget parameter. We evaluate this 
example for three different values of 7 , as shown in Table [IT] 
Following the above assumptions, we find that when 7 = 0.33 
(resp. 7 = 0.5 and 7 = 1), it implies that at least one (resp. two 
and three) of the three base-stations on the trajectory must be 
upgraded, in order to make it 7 -bottleneck-free. The optimal 
solutions for each of these cases are shown in the table. □ 

We pose the TUMP ( 7 ) problem in a graph setting. Each 
instance of the TUMP ( 7 ) problem is associated with a hyper¬ 
graph H = (V, E ) where V = { 77 ,..., v n } is the set of n 
nodes corresponding to the set of base-stations, £>, and E = 
{ei,..., e m } is the set of m hyper-edges corresponding to the 
set of trajectories, T. A node 77 represents a base-station Bi 
and a hyper-edge e 3 represents the set of base-stations that 
the trajectory Tj passes through, i.e., e 3 m {vi\Bi G Tj}. The 
degree of a node is the number of hyper-edges incident on it. 

Given a set of nodes S C V, its weight w(S) denotes the 
number of hyper-edges, e 7 , incident on at least one of the 
nodes in S such that Tj is 7 -bottleneck-free with respect to 
the nodes in S. For 7 = 1 , w(S) denotes the number of hyper¬ 
edges induced on the sub-hyper-graph formed by S, i.e., all 
the nodes of the hyper-edge are contained within S. 

Referring to this hyper-graph model, henceforth, we shall 
use the terms node and base-station (and respectively, hyper¬ 
edge and trajectory) interchangeably. The example in Fig. [4] 
shows this hyper-graph setting. 

As a pre-processing step to all our algorithms, we discard 
the trajectories that cannot be made 7 -bottleneck-free by 
upgrading any set of k base-stations. If the sum of weights 
of the k bottleneck base-stations with the largest weights is 
less than 7 , the trajectory can be deemed as infeasible and can 
be pruned. 


A. Optimal Algorithm 

The optimal solution to the TUMP ( 7 ) problem is rep¬ 
resented by the following integer programming formulation, 
denoted by ££tump( 7 ) : 

m 

max w 1 = 

3 =1 

n 

S.t. ^2, Xi — 

i=1 

Vi = l,...,n, 

and Vj = 1,..., m, 

where W 3 is given by Eq. [2j assuming it to be the bot¬ 
tleneck utility function. The solution {£, U} of the above IP 
formulation yields the optimal weight w 1 (i.e., the number 
of 7 -bottleneck-free trajectories) to the TUMP( 7 ) problem. 
Since the utility of each trajectory is at most 1, necessarily, 
w 7 < m, where m is the total number of trajectories. Since 
obtaining the optimal solution requires time that is exponential 
in the input size, it is impractical. Approximation algorithms 


x u Uj €= { 0 , 1 }, 

u 0 <^ 

7 
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are, thus, required to solve the problem in practical running the baseline algorithm for TUMP ( 7 ), and compare the 
times. performance of other algorithms against it, as discussed in 

detail in Section IVTl 


B. Approximation Algorithms 

In the following sections, we present few approximation 
algorithms for the TUMP (7) problem. 

Let A be any approximation algorithm for TUMP (7) that 
always returns a feasible solution S with weight w(S) > 
r.w(OPT ), for some fixed r E [0,1], where OPT refers to 
an optimal solution . Then r is said to be the approximation 
bound of the algorithm A. Moreover, if an algorithm has an 
approximation bound of r for the TUMP(l) problem, then 
r acts as an approximation bound for any TUMP (7 < 1) 
as well, because TUMP(l) is the worst case instance of 
TUMP( 7 ). 

C. LP 

Here, we propose a linear programming (LP) based heuristic 
for the TUMP (7) problem. The LP solution is based on 
the LP relaxation of the integer programming formulation 
LPtump( 7 ) specified for the optimal solution. This relaxation 
allows the variables X{,Uj to take fractional values, i.e., 
Vi, Vj, 0<Xi,Uj<l. Assume that S* = {x*, U*} denotes 
the (optimal) solution to the above linear program. We derive 
an integer approximate solution S = { 27 , Uj} from S* as 
follows. We pick the highest k values from (#*,...,#*) and 
set the corresponding 27 to 1 . 

The expensive step of this approach is solving the linear 
program that involves 0(m + n) constraints and 0(m + n) 
variables. The running time of this approach, is although 
considerably less than the optimal algorithm based on integer 
programming (JP TUMP ( 7 )), but still impractical for any rea¬ 
sonably sized dataset, as shown in the experiments, discussed 
in Section [VI| Therefore, we next, propose a set of approx¬ 
imation algorithms based on greedy paradigm, that not only 
offer faster running times, but also perform well in practice. 

D. SIMPLE-GREEDY 

For each base-station E>i E B , we define its bottleneck- 
weight, c Oi = {Y^JLi w ji\bji = 1}. Typically, a base-station 
that acts as a bottleneck for a large number of trajectories 
for a considerable fraction of their total time will have a 
high bottleneck-weight, and is thus, a good candidate for 
upgradation. The simple greedy approach picks the k base- 
stations having the largest bottleneck-weights . 

Example 2. Consider Example [I] Since each of the base- 
stations £ 9 ,..., £12 have maximal bottleneck-weight 4/3, 
this approach will select the base-stations Pio,Pn and B 12 
(breaking ties on higher index of base-station), irrespective of 
the value of 7. The utilities, thus obtained for different values 
of 7, are listed in Table [II] □ 

The primary drawback of this approach is that it is 
independent of the notion of trajectory utilities. This is 
why it does not perform well, especially when 7 is high. 
Nevertheless, owing to its simplicity, we consider it as 


Analysis of SIMPLE-GREEDY: First, we analyze the time 
and space complexity of SIMPLE-GREEDY, and then analyze 
its approximation bound. 

Theorem 2. The time and space complexity bounds for 
SIMPLE-GREEDY are 0(md-\-n log k) and 0(md) respec¬ 
tively, where m is the total number of trajectories, d is the 
maximum length of any trajectory, n is the total number of 
base-stations, and k is the budget parameter. 

Proof: First we analyze the time complexity. Given that 
there are m trajectories, and each of them has length at 
most d , scanning the input and computing the bottleneck- 
weight of each base-station, takes 0(md ) time. Further, com¬ 
putation of top -k base-stations w.r.t. their bottleneck-weights 
takes 0(n log fc) time m where n is the total number of 
base-stations and k is the number of base-stations to be 
upgraded. Thus, the time complexity of SIMPLE-GREEDY 
is 0(md + nlog k). 

As we analyze the space complexity, assuming each tra¬ 
jectory is stored as a list of tuples taking 0(d) space, the 
total input takes 0(md ) space. We require 0(n) space for 
computation of the bottleneck-weights ( 0 ( 1 ) space for each 
node). Since md > in, the space complexity, is thus 0(md). 


Unfortunately, this algorithm has no approximation bound 


for the TUMP ( 7 ) problem, as shown in Table IV 


Theorem 3. SIMPLE-GREEDY has no bounded approxima¬ 
tion for TUMP ( 7 ). 


Proof: To show that SIMPLE-GREEDY does not offer 
any approximation bound, we consider the following instance 
of TUMP( 7 ) problem. Let B = {Bi ,..., B n } be the set of 
base-stations, and T = {Ti,..., T m } be the set of trajectories 
such that m = |_§J + 1. For j = 1,..., m — 1, the trajectory 
Tj of length 2 , passes through the base-stations Bj and By 
where j' = j + |_§J. The trajectory T m of length |_§J, passes 
through the base-stations Bi ,..., J • We assume that each 
base-station is a bottleneck for each of the trajectories passing 
through it. Additionally, for any trajectory, we assume that 
the A, r] values are same across all the base-stations that the 
trajectory passes through. Therefore, we note that for any 
trajectory Tj, j = 1,..., m — 1, that passes through the base- 
stations Bj and By, Wjj = Wjj> =0.5. For trajectory T m , 
for i = 1 ,..., |_§J, we note that w m i = 1/|_§J. Therefore the 
bottleneck weight uji for base-station Bi is given by: 


f 0.5 + l/|_fj for i = 1 , ..., [f J 
\0.5 for i = |_f J + 1 ,.. . ,n 


(3) 


Thus, for any 2 < k < |_f J, SIMPLE-GREEDY would select 
any k base-stations from the set Bi ,..., 5 ^ ue t0 their 

higher bottleneck weight. But any such selection cannot make 
any trajectory 7 -bottleneck-free for 7 = 1. On the other hand, 
an optimal solution would make at least one trajectory 7 - 
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TABLE II. Utilities derived from different algorithms for Example[T]with k = 3. 


Algorithms 

7 = 0.33 

7 = 0.5 

7 — 1 


Upgrades 

Utility 

Upgrades 

Utility 

Upgrades 

Utility 

Optimal 

B4, Bn, B\2 

11 

-Bio, Bn , B 12 

4 

Bio, B12, B 14 

2 

SIMPLE-GREEDY 

B 10 , B n , B 12 

8 

B 10, B 11 , B 12 

4 

Bio, Bn, B i2 

0 

INC-GREEDY 

Ba, B 11, B\2 

11 

Bio, Bn, B12 

4 

Bio, Bn, B12 

0 

DEC-GREEDY 

Ba , B n, B\2 

11 

Bio, Bn, B12 

4 

Bio, B12, B14 

2 


bottleneck-free. Thus, we show that SIMPLE-GREEDY has 
no approximation bound for TUMP ( 7 ). ■ 

The time and space complexity bounds of the proposed algo¬ 
rithms are stated in Table [IV| The proofs are provided in fT2) . 
Further, we show that this algorithm has no approximation 
bound for the TUMP ( 7 ) problem, [12]. 

E. INC-GREEDY 

Based on the principle of maximizing marginal gain , this 
approach starts with an empty set of nodes So = 0 , and 
incrementally adds nodes such that each successive addition of 
a node produces the maximal marginal gain in the weight of the 
solution. The algorithm proceeds in iterations 0 = {1,. 

In the beginning of iteration 6 , suppose the existing solution is 
the set of nodes S 0-1 with weight w(So- 1 ). The node vq from 
the remaining set V\Sq -1 is added such that w{Sq-i U {u#}) 
is maximal. The new set is referred to as Sq. 

In any iteration, if multiple candidate nodes have the same 
maximal marginal utility, we select the one with the largest 
bottleneck-weight. Still, if ties remain, then without loss of 
generality, we break the tie by selecting the node with the 
highest index (the indices are arbitrary but unique). 

Example 3. We first evaluate INC-GREEDY on Example [I] 
for 7 = 0.33. At iteration 1, nodes B 9 ,..., B 12 have the same 
maximal marginal utility of 4 and the same bottleneck-weight 
4/3. So, we pick B 12 as it has the highest index. Next, we 
select Bu , as it offers the maximal marginal utility of 4. The 
base-station .Bio does not offer the maximal marginal gain any 
more. The base-station B 4 with marginal gain of 3 becomes 
the best choice next. Thus, we obtain a net utility of 11, which 
equals the optimal utility. 

The selections and utilities for other values of 7 case, are 
shown in Table [II] Noticeably, for 7 = 1 , this algorithms offers 
0 utility. Evaluating this case, in iteration 1, we find that all 
the base-stations offer 0 marginal utility, and so we sample the 
base-stations ..., B 12 on the basis of maximal bottleneck- 
weight. Eventually, we pick B 12 owing to its highest index. In 
the following 2 iterations, once again, we find that the marginal 
utility offered by any base-station is 0. Respecting the tie¬ 
breaking criteria, we pick Bn and B>io in respective order. This 
strategy, however, does not make any trajectory 7 -bottleneck- 
free, and yields 0 utility. □ 

The next algorithm (DEC-GREEDY) addresses this limita¬ 
tion. 

Analysis of INC-GREEDY: Fist we analyze the complexity 
bounds, followed by the approximation bound. 

Theorem 4. The time and space complexity bounds for INC- 
GREEDY are 0{md 2 ) and 0{md ) respectively. 


Proof: We first analyze the time complexity of the al¬ 
gorithm. As a pre-processing step, the algorithm computes 
the bottleneck-weight and initial marginal utility for each 
node, and computes the initial trajectory utility Wj for each 
trajectory. This step takes 0(md) time. Next, in any iteration 
6, we select the node vq that has maximal marginal utility 
(applying the tie-breaking rules, if necessary), and add it to 
the set Sq. This step takes 0(n) time. Then for each trajectory 
Tj incident on vq , for each 17 G V \ Sq such that Bi G Tj, 
and bji = 1 , we check if adding 17 to So would make Tj 7 - 
bottleneck-free. If so, we increment the current marginal utility 
of vi by 1. If Sq is the degree of the node vq , then this step 
takes O(Sod) time. Since Y^=i hence the total time 

spent in the above step over k iterations, is 0(md 2 ). The total 
time complexity of the algorithm, running over k iterations, is 
thus 0 (md 2 + kn ). 

To analyze the space complexity, we find that besides the 
input that takes 0 (md) space, the algorithm requires 0(1) 
space for each of the trajectories to store their current utilities 
that gets updated after each iteration. Further, we need 0(1) 
space for each of the nodes to store their marginal utility, 
bottleneck-weight, and to hold the information if they are 
selected. Thus, the space complexity of the algorithm is 
0 (md + m + n) = 0 {md). ■ 

The time complexity of this algorithm can be further im¬ 
proved to 0 (md 2 + k log n) by using advanced data structures 
such as Fibonacci Heaps (B) to store the marginal utilities 
of the nodes. However since the total time for updating 
of marginal utilities of the nodes (0 (md 2 )) dominates the 
time for computing the node with maximal marginal gain 
00 (k\ogn )), we avoided this implementation. 

Unfortunately, similar to SIMPLE-GREEDY, INC- 
GREEDY has no approximation bound for TUMP ( 7 ). 

Theorem 5. INC-GREEDY has no bounded approximation for 
TUMP( 7 ). 

Proof: To show that INC-GREEDY does not offer any 
approximation bound, we consider the following instance of 
TUMP (7) problem. Let B = {Bi,..., B n } be the set of 
base-stations, and T = {Ti,T 2 } be the set of trajectories. 
Without loss of generality, let n be a even number. Let 
trajectory T\ pass through the base-stations with odd index, 
i.e., Bn # 3 , T> 5 , • • •, B n _ 1 , and let trajectory T 2 pass through 
the base-stations with even index, i.e., B? 2 , B^, Bq, ..., B n . We 
assume that each base-station is a bottleneck for each of the 
trajectories passing through it. Additionally, for any trajectory, 
we assume that the A, 7 values are same across all the base- 
stations that the trajectory passes through. Since there is only a 
single trajectory (of length nj 2 ) incident on each base-station, 


















the bottleneck weight uji, for any base-station Bi G B, is 2/n. 
Let k = n/2 and 7 = 1 . 

Now let us evaluate INC-GREEDY on this instance. In 
each iteration, 1 , ...,&, the marginal utility of each node is 0 . 
Respecting the tie-breaking criteria, in iteration 1,..., k, we 
select the nodes B n .B n -\ ,..., respectively. How¬ 

ever, by this selection, neither of the trajectories could become 
7 -bottleneck-free. On the other hand, an optimal solution 
would select all the n/2 base-stations on either of trajectories, 
Ti or T 2 , thus making at least one trajectory 7 -bottleneck- 
free. Thus, we show that INC-GREEDY has no approximation 
bound for TUMP (7). ■ 

The time and space complexity bounds of this algorithm are 
stated in Table [TV] The proofs are provided in fl2| . Further, 
we show that this algorithm has no approximation bound for 
the TUMP (7) problem, p2). 


F. DEC-GREEDY 

This algorithm operates in the reverse order by minimizing 
marginal loss. It starts with the full set of nodes V and 
successively removes nodes in a manner that minimizes the 
marginal loss in the weight of the resulting set. More precisely, 
it starts with So = V, and removes one node in each iteration 
6 = {1,..., n — k}. At the start of iteration 6 , suppose the 
existing set of nodes is So- 1 with weight w(Sq~ 1 ). From 
this set, the node vq is removed such that w(Sq~ 1 \ {vo}) 
is maximal. The new set is referred to as S$. 

Moreover, after each iteration, all trajectories that can no 
longer be made 7 -bottleneck-free are pruned. In any given 
iteration, if multiple candidate nodes qualify to be deleted, then 
the one with the smallest bottleneck-weight is chosen. Still, if 
there are multiple candidates, the tie is broken by removing 
the node with the lowest index. 


Example 4. We first evaluate DEC-GREEDY on Exam¬ 
ple [T] for 7 = 1. At iteration 1, any of the base-stations, 
B 3 , T> 5 , Bq and B 7 , have the same minimal marginal 
loss in utility of 1. Respecting the tie-breaking criteria, we 
prune B\. Consequently, trajectory T\ becomes infeasible and 
is, thus, pruned. This results in marginal utility of B 7 to 
become 0, as there is no other incident trajectory on B 7 . So, 
in the next iteration, we prune B 7 . Proceeding in this manner, 
the order of deletions of the nodes in subsequent iterations 
is as shown in Table [Till At the end of 12 iterations, we are 
left with the base-stations T?io,Bi 2 and B \4 which render 2 
trajectories, T 8 and Tg, 7 -bottleneck-free, which is same as the 
optimal algorithm. 

Table |II| shows the output of DEC-GREEDY for other values 
of 7 , along with Table III which shows the order of deletions 
of nodes in the n — k = 12 iterations. □ 


Analysis of DEC-GREEDY: The time complexity analysis 
of DEC-GREEDY is similar to that of INC-GREEDY, with 
the difference that in this case there are n — k iterations. 
Thus the time complexity is 0(md 2 + (n — k)n), as shown in 
Table [Tv| Since, k is likely to be less that n/2, DEC-GREEDY 
is expected to take more number of iterations, and is therefore, 


TABLE III. Order of node-deletions by DEC-GREEDY for 
Example^] 


7 

Order of Nodes deleted during iteration 1 to 12 —>• 

0.33 

B i, -B2, B3, E?5, Bq , By, Bg, B 13, -B14, -B15? Bq , Bio 

0.5 

B 1, B2, Bg , Bg, B 13, B14, B 15, Bg, Bq, B4, By, Bq 

1 

B\, By, B2, Bg, Bg, B4, Bg, Bg, Bq, Big, Bn, Big 


more expensive in terms of time, when compared to INC- 
GREEDY. The space complexity of DEC-GREEDY is same 
as in the case of INC-GREEDY, i.e., 0(md). 

We next analyze the approximation bound of DEC- 
GREEDY algorithm. 

Theorem 6 . The approximation bound of DEC-GREEDY for 
TUMP( 7 ) is ( k d )/( n d ). 

Proof: To analyze the worst case scenario, we assume 
that each base-station is a bottleneck w.r.t. each of the incident 
trajectories, and further, 7 = 1 . Assume that Sq denotes the 
set of nodes in the sub-hyper-graph resulting at the end of 
iteration 6 = { 1 ,..., n — k}. Since 7 = 1 , w(Sq) denotes the 
number of hyper-edges induced in this sub-hyper-graph, i.e., 
all its nodes must be in Sq. Following the above assumption, 
once the node vq is removed, all its incident hyper-edges in 
Sq _i are removed, because the corresponding trajectories can 
no longer become 7 -bottleneck-free. Therefore, the node vq 
(selected to be pruned due to its minimal marginal loss) must 
have the minimal degree in the sub-hyper-graph induced over 
the nodes in Sq-\. This ensures that the weight of the resulting 
set w(Sq) = w(Sq- 1 \ {vq}) is maximal. 

Note that the sum of degrees of nodes in So- 1 is equal to 
the sum of the lengths of the trajectories induced in the sub¬ 
hyper-graph So -1 which is at most w(So-i)d where d is the 
maximum length of any trajectory. Since |S 0 _i| = n — 6 + 1, 

the average degree of a node in So -1 is at most d = 

. After DEC-GREEDY removes the node vq with 
minimal degree (which is at most the average degree), the 
weight of the sub-hyper-graph So is bounded as 

w (So) > w (So- 1) - w (So- 1) rc _ g + 1 

Hence, after n — k iterations 

«KS„-*)>t0(So)n ( 1- n-g + l ) 

0=1 v 7 


For TUMP(l), we can assume that k > d since any 
trajectory with \Tj\ > k can be pruned as it can never be made 
7 -bottleneck-free. Hence, using n> k > d and w(Sq) = m, 


w (S n -k) > m 


fc(fc-l)...(fc-d+l) 
n(n — 1 ) ... (n — d + 1 ) 


— m 



Since weight of any optimal solution is at most m, the 
approximation bound of DEC-GREEDY is (^) / (™). ■ 


G. Equivalence of Incremental and One Shot Upgrades 

An interesting and very useful property of the proposed 
INC-GREEDY and DEC-GREEDY algorithms is that they 
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naturally support incremental upgrades of base-stations. Note 
that INC-GREEDY (respectively, DEC-GREEDY) selects (re¬ 
spectively, prunes) one base-station in each iteration, and 
the selection criteria is independent of the budget parameter 
k. Hence, it can be shown that, for both these algorithms, 
piecewise incremental upgrades is equivalent to a one-shot 
upgrade, provided the total budget is same in both the cases. 

More formally, if k\ + &2 + • • • = k (Vfc$ > 0), then 
successive applications of INC-GREEDY (respectively DEC- 
GREEDY) with budget parameters fci, followed by k^ etc., 
would upgrade the same set of base-stations as a one-shot 
application of INC-GREEDY (respectively, DEC-GREEDY) 
would with budget parameter k. This property is very impor¬ 
tant as it allows the network planners to upgrade as and when 
some budget is allocated. The overall effect on the network is 
the same even if the entire budget was made available at one 
go. First we establish this claim for INC-GREEDY, followed 
by DEC-GREEDY. 

Proposition 1. Given any instance of TUMP ( 7 ), 

S k = S e U S' k _ e , \/l<k<n,l<0<k (4) 

where S k is the set of nodes (corresponding to the upgraded 
base-stations) by INC-GREEDY (k), and Se, S' k _ e are the 
sets of nodes provided by two consecutive operations of INC- 
GREEDY (0) on V, and INC-GREEDY (k - 0) on V \ S e 
respectively. 

Proof: Observe that the algorithm always selects the 
nodes in V in a consistent order. This follows from the 
fact that the algorithm breaks all ties deterministically (if 
needed, using the index of the nodes). Therefore, without 
loss of generality, assume that INC-GREEDY (n) applied on 

the node set V selects the nodes in the order v\ ,_, v n . 

Hence, INC-GREEDY (fc) applied on the node set V would 
select the nodes S k = {vi, ..., v k }. For 0 < k, Sq = 
{v u ...,v e } C S k . Applying INC-GREEDY (k - 0) on 
the set V \ Sq = {vq+i, ..., v n } would select the nodes 
S’ k _ e = {v e+1 ,...,v k }. Hence, S k = S e US' k _ 0 . ■ 

Proposition 2. Given any instance of TUMP ( 7 ), 

S n -k = S n - e U S ' n _ k , VI < k < n, 1 < 0 < k (5) 

where S n - k is the set of nodes retained (corresponding to 
the upgraded base-stations) by applying DEC-GREEDY (k) 
on V , and S n -$, S' n _ k are the sets of nodes retained by two 
consecutive operations of DEC-GREEDY (0) on V, and DEC- 
GREEDY (k — 0) on V \S n -e respectively. 

Proof: Observe that the algorithm always prunes the 
nodes in V in a consistent order. This follows from the fact that 
the algorithm breaks all ties between the nodes deterministi¬ 
cally (if needed, using the index of nodes). Therefore, without 
loss of generality, assume that DEC-GREEDY (0) applied on 
the node set V prunes the nodes in the order vi,...,v n . Hence, 
applying DEC-GREEDY (fc) on V would prune the nodes 
Vi,v 2 ,..., v n - k and return S n - k = {v n - k+1 ,v n }. For 
0 < k, S n -e = {v n - 9 + 1 , C S n - k . Applying DEC- 

GREEDY (k — 9) on the set V \ S n -$ = {ui,..., v n _$} 


TABLE IV. Summary of Analysis of Algorithms for TUMP ( 7 ) 


Algorithms 

Time Complexity 

Space Complexity 

Approx. Bound 

SIMPLE-GREEDY 

0(md + n log k) 

O(md) 

0 

INC-GREEDY 

0(md 2 + kn ) 

O(md) 

0 

DEC-GREEDY 

0(md 2 + (n — k)n) 

0(md) 

(S)/(S)_ 



Fig. 5. CING City Model: Example of a virtual NYC 


would prune the nodes vi,...,v n - k and return S' n _ k 
fc+i? • • • 0 }* Hence, S n — k — S n —e U S n _ k . 


Summary of the Algorithms 

A summary of analysis of the proposed greedy algorithms 
is presented in Table [TV] While SIMPLE-GREEDY, and INC- 
GREEDY have lower time complexities than DEC-GREEDY, 
but they do not offer any bounded approximation. DEC- 
GREEDY, on the other hand, offers a bounded approximation 
with the trade-off of higher running time. 


V. Evaluation Methodology 

We show the efficacy of the algorithms using real as well 
as synthetic but realistic datasets. Our formulation considers 
two critical parameters whose values are best decided by the 
operator: the budget (fc), and the threshold 7 (which signifies 
the desired satisfaction level on the trajectory that the operator 
is targeting). The input trajectories are generally chosen by the 
cellular operator based on the target subset of subscribers (e.g., 
based on focused micro-segments such as long-commuting 
3G subscribers). The trajectories of subscribers are readily 
available to operators in Call Detail Records (CDRs) and deep- 
packet inspection logs |8|. 


A. City and Network Generator (CING) 

Real data about trajectories of subscribers is generally not 
publicly available. Hence, we designed City and Network 
Generator (CING) tool to generate representative traces of 
population distribution, mobility and network topology. We 
evaluate our protocols under such synthetic data and, also, 
on one real trace data set (E); we describe these data sets 
later. CING models both the city population and network 
deployment by considering: (1) how users travel in a city, 
and (2) how base-stations (BS) are deployed. We now briefly 
explain these aspects. 

City Model: 

We model a virtual city based on the existing map-data 
and the observed spatial distribution of homes and offices. 
We employ a concentric city model, where the city is divided 
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into a few layers G3- The Central Business District (CBD) 
constitutes the city-center, and usually has high office density. 
The Urban Envelopes (UE) are around the CBD. Edge Cities 
(EC) surrounds the main city’s UE. 

CING reads parameters such as dimensions and of¬ 
fice/residential densities in various layers of the city, and 
creates synthetic user homes and offices. Lang et. al. have 
reported these information for thirteen popular US cities G3- 
However, only some regions of the city has been surveyed. 
Since we are interested in movement of subscribers in a 
city, directly using the partial city information leads to biased 
trajectories. Hence, we extrapolate the parameters reported to 
the entire city dimension. City dimension is computed by 
latitude and longitude information the outer-most layer of 
the city that is surveyed (the EC layer). We then construct 
concentic rectangles of CBD, SD, UE and EC layers for each 
city as shown in Fig. [5] 

The areas of the layers are scaled proportionally to match 
the overall area. We distribute homes/offices in each layer 
according to the observed spatial densities G3 Using map 
data, we then prune the locations that are in inaccessible 
regions, such as locations within the rivers (see Fig. [5]). 

Currently, we associate home and office as User hangouts. 
A user hangout is represented by a spatial location and most 
frequent user’s arrival and departure times. We artificially 
populate the home and office hangouts by observing weekday 
commute patterns of the users.; for example, by observing 
that the user leaves home at a random time from 7 AM to 
11PM to commute to office. If the spatio-temporal hangout 
information is available (e.g., as a result of mining real profile 
data, social network data or CDR data), the data can hence also 
be provided as an input to CING tool rather than generating 
virtual locations. Network Model: CING generates base- 
stations (BS) such that the number of BS in a region is 
proportional to the number of homes and offices. Our network 
consists of 82% 2G BS, based on the deployment statistics 
of a major Indian cellular operator ©• We mark 20% of 
the BS as congested. Based on our measurement observations 
(Section III]), we randomly choose per-user throughput within 
[20,80], [50,150], [20,400] and [300, 2000] kbps for congested 
2G, non-congested 2G, congested 3G and non-congested 3G 
BS respectively. 

Trajectory Model: 

The trajectory properties such as the path taken from origin 
to destination affects the user’s experience. CING derives the 
road-network graph from the OSM map-data fl6| . It computes 
the trajectories of the user by simulating user movement on 
road between home to office (using the shortest path algo¬ 
rithm). For each road trajectory thus computed, we compute 
the sequence of base-stations by assuming a hand-off policy 
where a user is connected to the nearest base-station at any 
location. The time connected to a base-station based on the 
road-speeds and BS coverage. The throughput of the user is 
dependent upon the per-user throughput of the BS and the 
duration of association. 



Fig. 6. Distribution of trajectory length and degree of a base-station: The 
degree of the base-station is normalized to the number of trajectories present 
in the scenario. 


B. Data sets 

We evaluate the performance of the proposed algorithms 
using both real and synthetic data. 

a) Data set from real traces: The Reality Mining (RM) data 
set lists the base-station handoffs of more than 100 users Q4). 
Most of the users belong to MIT and, hence, the set is biased. 

We use this data set to study optimizing trajectories of 
targeted subscribers with similar spatial hangouts and interests. 
We extract the sequence of base-station IDs that a user con¬ 
nects. We break the sequence of base-stations into trajectories 
where the user might have moved; we break a trajectory when 
there is no change in base-station for more than 30 minutes. 
We prune the data to include mobile trajectories of users which 
have more than 10 base-stations. We also remove trajectories 
with sequence of base-stations showing a ping-pong handoff 
pattern (17]|. The data set has 3819 trajectories and 17975 base- 
stations. 

We first extract the trajectories by assuming that the a 
trajectory starts if the user is previously static for more than 
30 minutes. The data also contains noisy trajectories, primariy 
because of: (1) base-station IDs which are out of the city are 
recorded, and (2) handoffs that often cause ping-pong effect 
where the device swaps between a set of base-stations even 
when the user is not moving dD Such outlier base-stations 
and trajectories are pruned; we remove 1000 base-stations that 
have the least number of trajectories passing through them, and 
only consider trajectories that have a length of greater than 10. 

We finally get 3819 trajectories and 17975 base-stations. 
Note that there is no base-station location information in this 
data; however, ID suffices for our evaluation. 

RM has 3,819 trajectories and 17,975 base-stations. Fig. [6] 
shows the CDF of the trajectory length and the degree of a 
base-station (normalized to the percentage of trajectories that 
pass through a base-station). The data shows a highly skewed 
distribution, with many base-stations having very low degree 
(when compared to the city movement in pure synthetic city 
data). We conjecture that this is due to the biased set of users; 
many trajectories often visit a very few base-stations near MIT. 

b) Synthetic data sets: We generate three classes of datasets, 
as summarized in Table El 

1. Star and Mesh: These topologies have artificially generated 
distribution to highlight upgrades in extreme cases of popula¬ 
tion distribution. Star has a dense CBD with offices, and users 
commute from their home in a thin UE layer. Mesh indicates 
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TABLE V. Synthetic data sets: We also evaluate Mesh topology and real traces from Reality Mining (RM) data set. 




(a) Performance of DEC-GREEDY 
is close to optimal 



(b) Time comparison: Our algo¬ 
rithms are 3-4 orders faster 
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(a) Varying budget (k’); 7 = 1 . 



(b) Relaxing QoS (7); k' — 0.2 


Fig. 7. Comparison of different schemes w.r.t. Optimal 


Fig. 8. Effect of trajectory length (L), k' and 7: Our algorithms are well- 
suited for network upgrades when stricter QoS has to be delivered over longer 
trajectories with low budgets 


a large city where people are equally likely to move in any 
direction. Here, trajectories are randomly assigned to base- 
stations with trajectory length distribution similar to Star. 

2. New York City (NYC) and Atlanta: We generate two 
representative large US cities of differing population distri¬ 
butions G3- NYC has a large CBD with dense offices, while 
Atlanta’s CBD is small and relatively sparse. Hence, NYC 
dataset consists of more concentric trajectories, and Atlanta’s 
trajectories are spread out. 

3. Bangalore: We simulate an Indian city, where the population 
distribution is different than USA. Offices are concentrated in 
business areas, and homes are spread out across city. 

Fig. [ 6 ] shows the trajectory length and percent of trajec¬ 
tories incident on base-stations in different data sets. When 
compared to the RM data set, the synthetic set has higher 
degree of trajectories incident on each base-station since we 
consider people moving from all parts of the city towards their 
respective offices. 

VI. Results 

We measure the effectiveness of the algorithms by com¬ 
puting the percentage of trajectories that are 7 - bottleneck- 
free , which is an indication of the percentage of mobile users 
who will be satisfied by the upgrade. We first analyze small- 
scale regions in the city to understand the effect of model 
parameters, and then we show the improvements in large-scale 
city-wide scenarios. 

A. Comparison with the Optimal 

We extracted the trajectories in a miniature 2x2 km 2 area 
from Bangalore dataset. Since the problem is NP-hard, the 


optimal solution does not scale up for larger city networks. 
Therefore, we compared the performance of our algorithms 
against the optimal solution on this small dataset. This region 
had 1405 trajectories and 30 base-stations, out of which we 
chose to optimize 9 base-stations. Since each dataset has 
different number of base-stations, we analyze the results with 
the fraction of BS to be upgraded, k' = 7 . In the above 
scenario, k' = ^ = 0.3. 

We also compare with LP-Relaxation b ased a pproach (T8), 
which was discussed in Section [IV] Fig. |7(a)| compares the 
performance of the algorithms at different 7 ’s, and Fig. |7(b)| 
shows the run-time for 7 = 1. Our algorithms, especially 
DEC-GREEDY, provide similar performances as optimal and 
LP-Relaxation, while running 3-4 orders of magnitude faster. 
SIMPLE-GREEDY fails to provide good user experience at 
higher 7 . 

INC-GREEDY is also closer to optimal; however the selfish 
choices of the algorithm during initial iterations results in a 
lower performance. We explain the reasons in more detail later. 

B. Effect of budget allocation, QoS and trajectory lengths 

We choose a 5x5 km 2 area in Bangalore to demonstrate the 
detailed effects of basic parame ters; a larger scale city analysis 
is later discussed in Section |VI-C| This area has 23,000 
trajectories with 107 BS. There are 4240, 5420, 8470 and 
5430 trajectories with lengths between [1, 5],[ 6 ,10], [11,15] 
and [16,20], respectively. 

1. Effect of budget allocation: Operators incrementally 
allocate small budgets for upgrades. For example, k' < 0 .2 is 
a representative annual budget of a major Telco |5|. Fig. | 8 (a)| 
shows the effect of altering the budget ( k ') when the operator 
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(a) Reality Mining 


(b) Atlanta 


(a) Star-like vs. Mesh-like 


(b) Runtime on Atlanta data-set 


Fig. 9. Performance of TUMP (7 = 1) in different scenarios: DEC- 
GREEDY and INC-GREEDY consistently outperform SIMPLE-GREEDY. 
With an upgrade budget k' = 0.2, our algorithms perform 3-8 times better 
in improving the user quality of service on trajectories in different city 
geographies when compared to greedy location-based base-station upgrades. 


chooses to ensure strict QoS to the mobile user (7 = 1). 
DEC-GREEDY consistently outperforms SIMPLE-GREEDY 
across different budgets. Note that DEC-GREEDY can be 
repeatedly applied in increments as more budget becomes 
available. Hence, the operator can consistently provide better 
performance than SIMPLE-GREEDY as the network evolves. 

INC-GREEDY yields better results than SIMPLE-GREEDY 
at low k' . However, its performance is worse than SIMPLE- 
GREEDY at intermediate fc'-values. This happens as, at each 
iteration, INC-GREEDY chooses to upgrade the BS that 
provides immediate benefits; the benefit lies in converting 
a bottleneck trajectory to 7 -bottleneck-free. This short-term 
focus on immediate satisfaction biases INC-GREEDY to miss 
out on choosing BSes that are beneficial to a larger number of 
trajectories which could have been possibly identified in later 
iterations. _ 

2. Effect of trajectory lengths: Fig. | 8 (a)| shows the ef¬ 
fect of optimizing experiences along short and long tra¬ 
jectories. If the operator chooses to upgrade short trajecto¬ 
ries, SIMPLE-GREEDY approach provides comparable per¬ 
formance as DEC-GREEDY. However, if the long-commuting 
trajectories are to be optimized, DEC-GREEDY approach 
provides substantial gains, especially for small budgets. 

The INC-GREEDY algorithm starts providing better gains 
for lower k' values than SIMPLE-GREEDY. However, DEC- 
GREEDY consistently outperforms the other two algorithms 
across different trajectory le ngths . 

3. Strictness of QoS: Fig. | 8 (b)| shows the effect of relaxing 
the strictness of the QoS provided to the mobile user (reducing 
the value of 7 ). For smaller 7 or for shorter trajectories, 
SIMPLE-GREEDY scheme is comparable to DEC-GREEDY; 
there are a few highly visited BS on the trajectory, which 
are candidates for optimizing in SIMPLE-GREEDY approach, 
which also renders the trajectory 7 -bottleneck-free. However, 
when stricter user experience is needed over longer trajectories, 
DEC-GREEDY provides significant benefits over SIMPLE- 
GREEDY. 


C. City-scale evaluation 

Fig. [9] shows the performance of algorithms in different 
datasets. In the real RM scenario, INC-GREEDY and DEC- 


Fig. 10. Impact of population and runtime: Star topologies provide more 
marginal returns for trajectory optimization. The runtime of INC-GREEDY 
increases as k' increases. However, DEC-GREEDY has a reverse trend. 


GREEDY algorithms consistently provide more than 10-20% 
improvement when the budget is medium to low. In this biased 
distribution, few BS (near MIT) are frequently visited by many 
trajectories. While SIMPLE-GREEDY optimizes frequently 
accessed BS, it fails to optimize the QoE of frequently ac¬ 
cessed trajectories. 

In the simulated datasets, DEC-GREEDY and INC- 
GREEDY constantly outperform SIMPLE-GREEDY. How¬ 
ever, the improvement for low k' is not as pronounced as in 
RM dataset since trajectories are spread across the entire city, 
and hence selecting relevant BS is hard. We also highlight the 
regions of k' < 0.2 in Fig. [9j 

1. Implications of Population Distribution: We observed 
that NYC consistently provides higher marginal gains than 
Atlanta. This is because NYC has a high office density in the 
center, and trajectories form a star-like topology. Upgrading 
a few BS at the center, hence, benefits a large number of 
trajectories. In contrast, Atlanta’s trajectories do not form a 
pronou nced s tar-topology. 

Fig. 1 10(a)| compares the performance of DEC-GREEDY 
in: (1) artificial Star and Mesh topologies, and (2) realistic 
star-like NYC and mesh-like Atlanta, to highlight the impact 
of population distribution. Star-like topologies continuously 
provide significant improvement on small increases in budget. 
Unlike Star, Mesh requires a large budget (fc) to provide good 
performance. For example, for a budget k = 0.2, Star data¬ 
set provides two orders better improvement than Mesh. Hence, 
while rolling out new upgrades in cities, the operator has to 
be cognizant that cost of providing better user experience in a 
star-like city is lower than in a mesh-like city. 

2. Runtime comparison: Fig. 10(b) shows the runtime 
of different algorithms on the Atlanta data-set for 7 = 1 . 
DEC-GREEDY and INC-GREEDY are slower than SIMPLE- 
GREEDY since both these algorithms are iterative. The run¬ 
ning time of INC-GREEDY increases as k increases, whereas 
DEC-GREEDY has a reverse trend. 


VII. Related Work 

To the best of our knowledge, there is no literature that 
incorporates mobility pattern of the users for base-station 
upgrades. The related areas can be broadly classified into 
handoff protocols and radio frequency (RF) planning. 

(1) Handoff protocols: These protocols improve connectivity 
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by associating a device with a better base-station (T9) . They 
fail to improve experience if the base-stations are inherently 
limited in resources (e.g., 2G base-stations). 

Other techniques, such as Cell breathing, dynamically alter 
the coverage area based on the load |[20). While this provides 
high-capacity to static users, it does not explicitly address the 
problem of eliminating bottleneck across trajectories. 

In contrast to the above protocol based approaches, we focus 
on the macro-level problem of base-station network planning. 
(2) RF planning: RF planning techniques optimize the trans¬ 
mission power, frequency, load and location of the base- 
station for providing better coverage and/or capacity @ ED- 
(23). RF planning configures base-stations to ensure adequate 
signal-to-noise ratio (SNR) at the devices, considering the 
interference from neighboring cells (24), (25) . Some radio 
planning also considers frequency allocation and cell coverage 
optimization to cater to mobile users (23) . Most of the RF 
planning tools, such as Atoll |[7), utilize drive tests and carrier 
wave (CW) measurements to recognize the bottlenecks (6). 
Such active measurements inherently limit measuring actual 
user experiences; they cannot scale to measuring the millions 
of subscriber trajectories. Other short-range technologies (5), 
|3j, 1261 are susceptible to frequent handoffs for mobile users. 

In summary, unlike the above studies, we explicitly consider 
macro-cell planning for providing better user-experience along 
users’ trajectories. We use user’s call logs, instead of passive 
measurements, to quantify mobile user experience. 

VIII. Conclusions 

Macro-cell planning for mobile users is a relatively un¬ 
explored problem in the evolution of heterogeneous cellular 
network architectures. This paper addressed the problem of 
performing macro-cell base-station upgrades by accounting for 
the mobile user satisfaction along their trajectories. 

We conducted a measurement-based experiment to show that 
mobile users suffer from degraded quality of experience on 
their everyday commute trajectories. Based on our findings, 
we formulated a generic problem that plans macro-cell up¬ 
grades to optimize mobile user experience. Our formulation 
utilizes active logs to quantify user experience. In addition, 
our formulation enables the operators to plan the upgrades in 
a way that is cognizant of their business needs and constraints. 
This is done by allowing the operators to define key parameters 
such as the budget for upgrades, desired satisfaction level on 
the trajectory, and a micro-segment of the mobile users whose 
mobile experience has to be optimized (e.g., high-value 4G 
customers with long commutes). 

We proved the NP-hardness of the problem, designed two 
approximation algorithms, and proved their approximation 
bounds. We designed a synthetic city and network trace 
generator, and showed the dependence of planning budget 
and algorithm effectiveness in various population distributions. 
Our algorithms consistently enable the operator to achieve 
3-8 times better user experience than simple location-based 
greedy heuristics for the same budget. In future, we plan to 
address problems in “continuous network planning” systems 
that ensure quality of experience for mobile users. 
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